16 research outputs found

    Scalable Collaborative Filtering Recommendation Algorithms on Apache Spark

    Get PDF
    Collaborative filtering based recommender systems use information about a user\u27s preferences to make personalized predictions about content, such as topics, people, or products, that they might find relevant. As the volume of accessible information and active users on the Internet continues to grow, it becomes increasingly difficult to compute recommendations quickly and accurately over a large dataset. In this study, we will introduce an algorithmic framework built on top of Apache Spark for parallel computation of the neighborhood-based collaborative filtering problem, which allows the algorithm to scale linearly with a growing number of users. We also investigate several different variants of this technique including user and item-based recommendation approaches, correlation and vector-based similarity calculations, and selective down-sampling of user interactions. Finally, we provide an experimental comparison of these techniques on the MovieLens dataset consisting of 10 million movie ratings

    Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States

    Get PDF
    Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 different academic, industry, and independent research groups. A multimodel ensemble forecast that combined predictions from dozens of groups every week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national level from April 2020 through October 2021. The performance of 27 individual models that submitted complete forecasts of COVID-19 deaths consistently throughout this year showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naïve baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-wk horizon three to five times larger than when predicting at a 1-wk horizon. This project underscores the role that collaboration and active coordination between governmental public-health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks

    The United States COVID-19 Forecast Hub dataset

    Get PDF
    Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) partnered with an academic research lab at the University of Massachusetts Amherst to create the US COVID-19 Forecast Hub. Launched in April 2020, the Forecast Hub is a dataset with point and probabilistic forecasts of incident cases, incident hospitalizations, incident deaths, and cumulative deaths due to COVID-19 at county, state, and national, levels in the United States. Included forecasts represent a variety of modeling approaches, data sources, and assumptions regarding the spread of COVID-19. The goal of this dataset is to establish a standardized and comparable set of short-term forecasts from modeling teams. These data can be used to develop ensemble models, communicate forecasts to the public, create visualizations, compare models, and inform policies regarding COVID-19 mitigation. These open-source data are available via download from GitHub, through an online API, and through R packages

    Finishing the euchromatic sequence of the human genome

    Get PDF
    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human enome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Get PDF
    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.Peer reviewe

    Effects of Anacetrapib in Patients with Atherosclerotic Vascular Disease

    Get PDF
    BACKGROUND: Patients with atherosclerotic vascular disease remain at high risk for cardiovascular events despite effective statin-based treatment of low-density lipoprotein (LDL) cholesterol levels. The inhibition of cholesteryl ester transfer protein (CETP) by anacetrapib reduces LDL cholesterol levels and increases high-density lipoprotein (HDL) cholesterol levels. However, trials of other CETP inhibitors have shown neutral or adverse effects on cardiovascular outcomes. METHODS: We conducted a randomized, double-blind, placebo-controlled trial involving 30,449 adults with atherosclerotic vascular disease who were receiving intensive atorvastatin therapy and who had a mean LDL cholesterol level of 61 mg per deciliter (1.58 mmol per liter), a mean non-HDL cholesterol level of 92 mg per deciliter (2.38 mmol per liter), and a mean HDL cholesterol level of 40 mg per deciliter (1.03 mmol per liter). The patients were assigned to receive either 100 mg of anacetrapib once daily (15,225 patients) or matching placebo (15,224 patients). The primary outcome was the first major coronary event, a composite of coronary death, myocardial infarction, or coronary revascularization. RESULTS: During the median follow-up period of 4.1 years, the primary outcome occurred in significantly fewer patients in the anacetrapib group than in the placebo group (1640 of 15,225 patients [10.8%] vs. 1803 of 15,224 patients [11.8%]; rate ratio, 0.91; 95% confidence interval, 0.85 to 0.97; P=0.004). The relative difference in risk was similar across multiple prespecified subgroups. At the trial midpoint, the mean level of HDL cholesterol was higher by 43 mg per deciliter (1.12 mmol per liter) in the anacetrapib group than in the placebo group (a relative difference of 104%), and the mean level of non-HDL cholesterol was lower by 17 mg per deciliter (0.44 mmol per liter), a relative difference of -18%. There were no significant between-group differences in the risk of death, cancer, or other serious adverse events. CONCLUSIONS: Among patients with atherosclerotic vascular disease who were receiving intensive statin therapy, the use of anacetrapib resulted in a lower incidence of major coronary events than the use of placebo. (Funded by Merck and others; Current Controlled Trials number, ISRCTN48678192 ; ClinicalTrials.gov number, NCT01252953 ; and EudraCT number, 2010-023467-18 .)

    LET\u27S GET DIRTY! Mini Baja SAE Project

    No full text
    Matthew Suchyna, Matthew Minchen, Seth Britton, Ryan Smoral, Grant Guzek, Adam Glaser, Casey Willer, Evan Shagott, Jerome Allen, Nick Walker, Josh Schaner and Shaun Payne, ENT 422: Machine Design II Faculty Mentor: Professor Jikai Du, Engineering Technology For the Mini Baja team’s Senior Project, the group was tasked with building a Mini Baja car. The vehicle must meet the requirements laid down by the SAE International group, who will be judging our fully finished car based on a strict set of guidelines. The team’s goal was to build a car that passes its rigorous inspection. This means there was much at stake as we built this car. Once the car has been fully built, our team will take it to a competition that is attended to by various teams all across the country. There, we will be judged on how our car performs in a series of obstacle courses, culminating with a final endurance race, going head-to-head against cars from every state, with the winner crowned as champions. This feat is not to be taken lightly, for the competition is fierce, with multiple skilled racers. We strive to be able to put together a car that we can shower with champagne and go down in the annals of Buffalo State engineering history.https://digitalcommons.buffalostate.edu/srcc-sp20-compeng/1019/thumbnail.jp
    corecore